Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP add config schema & generate HTML docs #107

Closed
wants to merge 1 commit into from

Conversation

jameshadfield
Copy link
Member

Very much an experimental PR.

I think checking configs against this schema (within Snakemake) is useful by itself. I find JSON schemas painful to write but they work. Writing them indentified a few omissions from #104 which was immediate positive feedback.

I'm less certain about the generated HTML docs, but they are essentially free given the investment in the schema. I've committed the HTML in this PR to make it easier to review / try out, but you can generate docs using json-schema-for-humans via:

generate-schema-doc --config template_name=js config/schema.yaml public/schema.html

# Before we validate the config against the schema, check to see if we've failed to provide one
if not len(config.keys()):
print("-"*80 + "\nNo config loaded!", file=sys.stderr)
print("Avian-flu is indented to be run from the snakefile inside a subdir "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("Avian-flu is indented to be run from the snakefile inside a subdir "
print("Avian-flu is intended to be run from the Snakefile inside a subdir "

oneOf:
- type: object
patternProperties:
"^[^_/]+/[^_/]+/[^_/]+$":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a nit? I'm not sure if it matters too much, but I think the intent of your wildcarding approach is that *, when it appears as an element, should appear alone, and that is not enforced by this regex — you'd need to replace the [^_/]+ bits with something more like (*|[^_/]+), I think?

@jameshadfield
Copy link
Member Author

@tsibley & I discussed a little bit what these docs might look like in an ideal world. TL;DR it wouldn't be the auto-generated ones here, although the schema's a good start.

If we imagine people will run workflows either with no adjustments or with config overlays (YAML and/or --config) the docs should give everything they need without having to read through the snakefile or base config YAML. The generated docs would be able to:

  • point to (or include) instructions for how to run nextstrain analyses
  • explain at a high level the steps taken by the pipeline (i.e. convey what it does)
  • list out the config options with some concept of importance (i.e. don't treat every top-level key the same)
  • include the defaults for each config value. There's subletly here for workflows with multiple base configfiles, but if we treat gisaid & h5n1-cattle-outbreak as separate workflows which happen to share a lot of the same code then it becomes easier to reason with.
  • include examples and or help where and as appropriate. For instance the expected metadata columns would be part of this.
  • linked to the correct workflow version (e.g. accessed via the CLI, either via public URL or local server)

If this were served locally (as HTML) you could imagine one day generating config overlays from it.

Generate docs using json-schema-for-humans:
```
generate-schema-doc --config template_name=js config/schema.yaml public/schema.html
```
# prototype based on <https://github.com/nextstrain/augur/blob/master/augur/validate.py>
from importlib import metadata
# <https://github.com/nextstrain/augur/issues/1358>
assert str(metadata.version("jsonschema")).startswith('3.'), "jsonschema must be version 3"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to change this since Augur 27.0.0 no longer supports v3

@victorlin victorlin deleted the branch james/refactor-inputs December 17, 2024 23:31
@victorlin victorlin closed this Dec 17, 2024
@jameshadfield
Copy link
Member Author

jameshadfield commented Dec 18, 2024

@victorlin why was this closed? I think it should still be an open (WIP/experimental) PR to aid big-picture discussions around schemas and workflow documentation

Oh, I think it was automatically closed after you (correctly) deleted the branch from the already-closed #106, and that branch was the base branch for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants